

International Journal of Computing Communication and Information System(IJCCIS)

Vol 6. No.1 – Jan-March 2014 Pp.72-77

©gopalax Journals, Singapore available at: www.ijcns.com

ISSN: 0976-1349

# DESIGN OF MULTILEVEL TWO DIMENSIONAL-DISCRETE WAVELET TRANSFORM FOR IMAGE PROCESSING APPLICATIONS

## J.Sarala<sup>1</sup>, Mr.E.Sivanantham<sup>2</sup>

<sup>1</sup>Post Graduate Student, <sup>2</sup>Associate professor Department of Electronics and Communication, Dhanalakshmi Srinivasan College of Engineering and Technology, Affiliated to Anna University, Chennai-25 <sup>1</sup> sarala.jayakumar@ymail.com, <sup>2</sup> sivaesiva@gmail.com

### **ABSTRACT**

The design of a multilevel two dimensional-discrete wavelet transform (2D-DWT) with a novel Vedic multiplier is presented. To design the 2D-DWT, a digital finite impulse response (FIR) filter is used to increase the image resolution and eliminate the unwanted noise present in the image. The conventional three-level 2D-DWT is designed using a regular Vedic multiplier, but it consumes more area and power and provides lower image resolution. The multiplier-accumulate (MAC) unit in the FIR filter is therefore modified to obtain an efficient FIR filter. The conventional FIR filter with the regular Vedic multiplier does not work correctly when the carry input is 1. To overcome this fault, a novel Vedic multiplier is proposed and designed using fewer half adders and full adders. A four-level 2D-DWT is applied to increase the image resolution. Simulation is carried out using matrix laboratory (Matlab2008a) and Modelsim6.3c. Synthesis and implementation are carried out using Xilinx tools and a Spartan3 field programmable gate array (FPGA).

**Index Terms**— Two dimensional discrete wavelet transform (2D-DWT), Vedic multiplier, multiplier-accumulate unit (MAC), finite impulse response (FIR).

### 1. INTRODUCTION

The multilevel two dimensional-discrete wavelet transform (2D-DWT) is used for digital signal processing (DSP) and image processing applications. The lifting based one dimensional-discrete wavelet transform (1D-DWT) provides lower image resolution and also requires a longer processing time. The wavelet transformation is an extensively used technique for image processing applications. Unlike traditional transforms such as the fast Fourier transform (FFT) and discrete cosine transform (DCT), the discrete wavelet transform (DWT) holds both time and frequency data, based on a multi-resolution analysis technique. This is a powerful approach to signal processing and analysis. As its name implies, multiresolution theory is concerned with the representation and analysis of signals or images at more than one resolution. This gives a better quality of reconstructed picture for the same compression than is possible with other transforms. In order to implement a real time codec based on the DWT, it needs to be targeted on a fast device. Field programmable gate array (FPGA) implementation of the DWT results in higher processing speed and lower cost when compared to other implementations such as advanced RISC (reduced instruction set computer) machines (ARM) processors, DSPs etc. The discrete wavelet transform is therefore increasingly used for image coding [4].

This is because the DWT can decompose signals into different sub-bands with both time and frequency information, which helps to achieve a high compression ratio. It supports features like progressive image transmission (by quality, by resolution), ease of compressed image manipulation, region of interest coding, etc. The joint photographic experts group (JPEG) 2000 incorporates the DWT into its standard. Recently, several very large scale integration (VLSI) architectures have been proposed to realize single chip designs for the DWT. Traditionally, algorithms were implemented on programmable DSP chips for low-rate applications or on VLSI application specific integrated circuits (ASICs) for higher rates. To perform the convolution, a fast multiplier is required, which is crucial in making the operations efficient [5].

### 2. LIFTING BASED DWT SCHEME

The top level structural design for the 1D-DWT is presented in Fig. 1a and Fig. 1b [10]. Input X is decomposed into several sub bands of low frequency and high frequency components to extract the detailed parameters from X using many stages of low pass filters (LPF) and high pass filters (HPF). The sub band filters are symmetric and satisfy the orthogonality property. For an input image, two 1D-DWT computations are carried out in the horizontal and vertical directions to compute the two level decomposition. The inverse DWT process combines the decomposed image sub bands to recover the original signal.

The reconstruction of the image is possible owing to the symmetric property and inverse property of the low pass and high pass filter coefficients. Input $x(n_1, n_2)$ is decomposed into four sub-components: low-low ($Y_{LL}$), low-high ($Y_{LH}$), high-low ($Y_{HL}$), and high-high ($Y_{HH}$). This results in a one level decomposition. The $Y_{LL}$ sub-band component is further processed and decomposed into another four sub-band components, thus forming a two-level decomposition. This process is continued as per the design requirements until the requisite quality is obtained. Every stage of the DWT requires LPF and HPF filters with down sampling by 2. Lifting based DWT computation is widely adopted for image decomposition [2].
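One filter-bank stage (LPF and HPF followed by down sampling by 2) can be sketched in Python as below. This is only an illustrative sketch: the function names `fir_mac` and `analysis_stage` are hypothetical, and the two-tap Haar-like coefficients are an assumption chosen for brevity, not the filters used in this work. The inner loop is the multiply-accumulate (MAC) operation that a hardware FIR filter performs.

```python
def fir_mac(x, h):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k] (zero initial state)."""
    y = []
    for n in range(len(x)):
        acc = 0.0  # accumulator of the MAC unit
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]  # one multiply-accumulate per tap
        y.append(acc)
    return y


def analysis_stage(x, lpf, hpf):
    """One DWT stage: filter with LPF and HPF, then keep every second sample."""
    low = fir_mac(x, lpf)[1::2]   # down sampling by 2
    high = fir_mac(x, hpf)[1::2]  # down sampling by 2
    return low, high


# Assumed two-tap filters (Haar-like average and difference), for illustration only.
LPF = [0.5, 0.5]
HPF = [0.5, -0.5]

low, high = analysis_stage([2, 4, 6, 8], LPF, HPF)
print(low, high)
```

Feeding the `low` output back into `analysis_stage` gives the next decomposition level.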



Figure. 1. Image decomposition (a) one dimensional (b) two dimensional



Figure. 2. Lifting based forward one dimensional-discrete wavelet transform (1D-DWT)



Figure. 3. Dyadic decomposition for three decomposition levels

In this work, a modified architecture based on the Vedic multiplier is proposed to realize the lifting-based DWT. The lifting scheme is one of the techniques used to realize the DWT architecture.

The lifting scheme is used in order to reduce the number of operations to be performed by half, and the filters can be decomposed into further steps in the lifting scheme [7]. Both the memory requirement and the computation are lower with the lifting scheme. The implementation of the algorithm is fast, and the inverse transform is also simple in this method. The block diagram for the lifting scheme is shown in Fig. 2 [3].

First, the input sequence $x_i$ is split into even and odd parts, $s_i$ and $d_i$. Next, two lifting steps are applied to these sequences. The outputs are given by $s_i^n$ and $d_i^n$, where n denotes the stage of the lifting steps. Finally, through the normalization factors $k_1$ and $k_2$, the low pass and high pass filter coefficients $s_i$ and $d_i$ are obtained. $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants.

1) Splitting step:

$$d_{i}^{0} = x_{2i+1} \tag{1}$$

$$s_{i}^{0} = x_{2i} \tag{2}$$

2) Lifting step:

(First lifting step)

$$d_i^1 = d_i^0 + \alpha \times (s_i^0 + s_{i+1}^0) \quad \text{(predictor)} \tag{3}$$

$$s_i^1 = s_i^0 + \beta \times (d_{i-1}^1 + d_i^1) \quad \text{(updater)} \tag{4}$$

(Second lifting step)

$$d_i^2 = d_i^1 + \gamma \times (s_i^1 + s_{i+1}^1) \quad \text{(predictor)} \tag{5}$$

$$s_i^2 = s_i^1 + \delta \times (d_{i-1}^2 + d_i^2) \quad \text{(updater)} \tag{6}$$

3) Scaling step:

$$d_i = k_2 \times d_i^2 \tag{7}$$

$$s_i = k_1 \times s_i^2 \tag{8}$$
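The splitting, lifting, and scaling steps can be sketched in Python as follows. This is a hedged illustration, not the hardware design: the function name `lifting_dwt_1d` is hypothetical, the default constants are the commonly quoted 9/7 lifting values (the text does not give $\alpha$, $\beta$, $\gamma$, $\delta$, $k_1$, $k_2$ numerically, so these are assumptions), the choice $k_1 = 1/K$, $k_2 = K$ is one of several conventions, and boundary samples are handled by repeating the edge neighbour. Both update steps read $d_{i-1}$ and $d_i$, as in the second lifting step.

```python
# Assumed 9/7 lifting constants; the text only names the symbols.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def lifting_dwt_1d(x, alpha=ALPHA, beta=BETA, gamma=GAMMA, delta=DELTA,
                   k1=1.0 / K, k2=K):
    """Forward lifting 1D-DWT of an even-length sequence; returns (s, d)."""
    if len(x) < 2 or len(x) % 2 != 0:
        raise ValueError("input length must be even and at least 2")

    # Splitting step: even samples -> s, odd samples -> d.
    s = [float(v) for v in x[0::2]]
    d = [float(v) for v in x[1::2]]
    n = len(s)

    def predict(d_in, s_in, c):
        # d_i + c * (s_i + s_{i+1}); the last s is reused at the right edge.
        return [d_in[i] + c * (s_in[i] + s_in[min(i + 1, n - 1)])
                for i in range(n)]

    def update(s_in, d_in, c):
        # s_i + c * (d_{i-1} + d_i); the first d is reused at the left edge.
        return [s_in[i] + c * (d_in[max(i - 1, 0)] + d_in[i])
                for i in range(n)]

    d = predict(d, s, alpha)   # first lifting step, predictor
    s = update(s, d, beta)     # first lifting step, updater
    d = predict(d, s, gamma)   # second lifting step, predictor
    s = update(s, d, delta)    # second lifting step, updater

    # Scaling step.
    return [k1 * v for v in s], [k2 * v for v in d]


low, high = lifting_dwt_1d([10, 12, 14, 13, 11, 9, 8, 8])
print(low)
print(high)
```

Passing different constants (for example `gamma=0`, `delta=0`) reduces the sketch to a single lifting step, which can be useful when checking intermediate values by hand.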

Because the 2D-DWT is a separable transform, it can be computed by applying the 1D-DWT along the rows and columns of the input image of each level during the horizontal and vertical filtering stages. Every time the 1D-DWT is applied on a signal, it decomposes that signal into two sets of coefficients, a low-frequency and a high-frequency set. The low frequency set is an approximation of the input signal at a coarser resolution, while the high-frequency set includes the details that will be used at a later stage during the reconstruction phase.

Dyadic decomposition for three decomposition levels is shown in Fig. 3 [11]. The input of level j is the low-frequency 2D sub band LLj, which is actually the coarse image at the resolution of that level. In the first level, the image itself constitutes the LL image block (LL0). The coefficients L (H), produced after the horizontal filtering at a given level, are vertically filtered to produce sub bands LL and LH (HL and HH). The LL sub band will either be the input of the horizontal filtering stage of the next level, if there is one, or will be stored, if the current level is also the last one. All LH, HL and HH sub bands are stored, to contribute later to the reconstruction of the original image from the LL sub band.
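The row-then-column schedule and the level-by-level reuse of the LL sub band can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, and `haar_1d` (pairwise average and difference) is only a stand-in for the lifting 1D-DWT so that the example stays self-contained.

```python
def haar_1d(x):
    """Stand-in 1D transform (assumed): pairwise average and difference."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def transpose(m):
    return [list(row) for row in zip(*m)]


def filter_columns(block, dwt1d):
    """Vertical filtering: apply the 1D transform to every column."""
    lows, highs = [], []
    for col in transpose(block):
        lo, hi = dwt1d(col)
        lows.append(lo)
        highs.append(hi)
    return transpose(lows), transpose(highs)


def dwt2d_level(image, dwt1d=haar_1d):
    """One level: horizontal filtering gives L and H, vertical gives LL, LH, HL, HH."""
    L, H = [], []
    for row in image:
        lo, hi = dwt1d(row)
        L.append(lo)
        H.append(hi)
    LL, LH = filter_columns(L, dwt1d)
    HL, HH = filter_columns(H, dwt1d)
    return LL, LH, HL, HH


def dwt2d_multilevel(image, levels, dwt1d=haar_1d):
    """LL feeds the next level; LH, HL and HH of every level are stored."""
    stored = []
    ll = image  # LL0 is the image itself
    for _ in range(levels):
        ll, lh, hl, hh = dwt2d_level(ll, dwt1d)
        stored.append((lh, hl, hh))
    return ll, stored


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
final_ll, details = dwt2d_multilevel(image, levels=2)
print(final_ll, len(details))
```

The image sides must be divisible by 2 at every level, so a 4x4 input supports at most two levels in this sketch.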

### 3. CONVENTIONAL VEDIC MULTIPLIER FOR MULTILEVEL 2D-DWT

Vedic mathematics is the name given to the ancient Indian system of mathematics that was rediscovered in the early twentieth century. Vedic mathematics is based on sixteen principles or word-formulae, which are termed Sutras. A simple digital multiplier architecture (referred to henceforth as the Vedic multiplier) based on the Urdhva Tiryakbhyam Sutra is proposed. The block diagram of the conventional Vedic multiplier is shown in Fig. 4. Urdhva Tiryakbhyam is the general formula applicable to all cases of multiplication and also to the division of a large number by another large number. It means vertically and crosswise [1], [6], [12].

**Example 1:** The product of 1111 and 1111 using Urdhva Tiryakbhyam (vertically and crosswise) is given below.
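The vertically-and-crosswise procedure for this product can also be sketched in Python. This is an illustrative sketch with hypothetical function names; the number base is left as a parameter because the text does not state whether 1111 is meant as a binary or a decimal operand. Each column sums the crosswise products whose digit positions add up to the column index, and the carry is passed to the next column.

```python
def urdhva_tiryakbhyam(a_digits, b_digits, base=2):
    """Multiply two equal-length digit lists (most significant digit first)."""
    n = len(a_digits)
    if n != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    a = a_digits[::-1]  # least significant digit first
    b = b_digits[::-1]
    result = []
    carry = 0
    for col in range(2 * n - 1):
        # Crosswise products a[i] * b[col - i] for all valid digit pairs.
        total = carry + sum(a[i] * b[col - i]
                            for i in range(n) if 0 <= col - i < n)
        result.append(total % base)
        carry = total // base
    while carry:  # flush the remaining carry into the top digits
        result.append(carry % base)
        carry //= base
    return result[::-1]


def digits_to_int(digits, base):
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


ones = [1, 1, 1, 1]
print(urdhva_tiryakbhyam(ones, ones, base=2))   # binary reading of 1111 x 1111
print(urdhva_tiryakbhyam(ones, ones, base=10))  # decimal reading of 1111 x 1111
```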








Figure. 4. Conventional Vedic multiplier

The 16x16 bit Vedic multiplier module is implemented using four 8x8 bit Vedic multiplier modules as shown in Fig. 4. To get the final product, four 8x8 bit Vedic multipliers and three 16-bit ripple-carry (RC) adders are required.

### 4. PROPOSED MULTILEVEL 2D-DWT WITH MODIFIED VEDIC MULTIPLIER

In general, the folded structure consists of a pair of 1D-DWT modules (row and column processor) and a memory/storage component. The memory module takes three forms: a frame memory, a transposition memory, and a temporal memory. The frame memory is necessary to store the low-low sub band for level-by-level computation of the multilevel 2D-DWT. The transposition memory stores the intermediate values resulting from the row processor, and the temporal memory is used by the column processor to store the partial results. The frame memory may be either on-chip or external, and the other two are on-chip memories. The block diagram of the proposed 2D-DWT architecture is shown in Fig. 5.



Figure. 5. Block diagram of the proposed two dimensional-discrete wavelet transform (2D-DWT) architecture



Figure. 6. Block diagram of modified Vedic Multiplier

The beauty of Vedic mathematics lies in the fact that it reduces the otherwise cumbersome-looking calculations of conventional mathematics to very simple ones. This is because the Vedic formulae are claimed to be based on the natural principles on which the human brain works. It is an important field and provides some effective algorithms which can be applied to various branches of engineering such as computing and digital signal processing [12].

Firstly, the least significant bits (LSB) are multiplied which gives the LSB of the final product. Then, the LSB of the multiplicand is multiplied with the next higher bit of the multiplier and summed with the product of LSB of multiplier and the next higher bit of the multiplicand. The sum generates the second bit of the final product and the carry is added with the partial product obtained by multiplying the most significant bits to give the sum and carry. The sum is the third bit and carry becomes the fourth bit of the final product.
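The steps in this paragraph read as the 2x2 bit case, which can be sketched at gate level in Python as below. The function names are hypothetical, and the two half adders are an assumption about how the sum and carry bits would typically be formed; the sketch only checks the arithmetic, not area or delay.

```python
def half_adder(a, b):
    """Return (sum, carry) of two single bits."""
    return a ^ b, a & b


def vedic_2x2(a1, a0, b1, b0):
    """2x2 bit vertically-and-crosswise multiplier; returns bits (s3, s2, s1, s0)."""
    s0 = a0 & b0                             # vertical: LSB x LSB
    s1, c1 = half_adder(a0 & b1, a1 & b0)    # crosswise products summed
    s2, s3 = half_adder(a1 & b1, c1)         # vertical MSB product plus carry
    return s3, s2, s1, s0


# Exhaustive check against ordinary integer multiplication.
for a in range(4):
    for b in range(4):
        s3, s2, s1, s0 = vedic_2x2(a >> 1, a & 1, b >> 1, b & 1)
        assert (s3 << 3) | (s2 << 2) | (s1 << 1) | s0 == a * b
print("all 16 input pairs match")
```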

The 16x16 bit Vedic multiplier module is implemented using four 8x8 bit Vedic multiplier modules as shown in Fig. 6. Consider a 16x16 multiplication with A = A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0 and B = B15 B14 B13 B12 B11 B10 B9 B8 B7 B6 B5 B4 B3 B2 B1 B0. The multiplication result is S31 S30 S29 S28 S27 S26 S25 S24 S23 S22 S21 S20 S19 S18 S17 S16 S15 S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0. A and B are each divided into two parts: the 16 bit multiplicand A is decomposed into a pair of 8 bit halves AH-AL, and similarly the multiplier B is decomposed into BH-BL.

Each block shown is an 8x8 bit Vedic multiplier. The first 8x8 bit multiplier has inputs A7 A6 ... A1 A0 and B7 B6 ... B1 B0. The last block is an 8x8 bit multiplier with inputs A15 A14 ... A9 A8 and B15 B14 ... B9 B8. The middle shows two 8x8 bit multipliers with inputs A15 A14 ... A9 A8 & B7 B6 ... B1 B0 and A7 A6 ... A1 A0 & B15 B14 ... B9 B8. The final result of the multiplication is the 32 bit value S31 S30 ... S1 S0. To get this final product, four 8x8 bit Vedic multipliers, eight half adders, one full adder and two 16-bit ripple-carry (RC) adders are required.
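The way the four 8x8 partial products combine into the 32 bit result can be sketched in Python as follows. This is an arithmetic sketch only: `mul8` is a hypothetical stand-in for one 8x8 bit Vedic multiplier block, and plain integer addition stands in for the half adders, full adder and ripple-carry adders, so it says nothing about the adder structure of Fig. 6.

```python
MASK8 = 0xFF


def mul8(a, b):
    """Stand-in for one 8x8 bit Vedic multiplier block (assumed behaviour)."""
    if not (0 <= a <= MASK8 and 0 <= b <= MASK8):
        raise ValueError("operands must fit in 8 bits")
    return a * b


def vedic_16x16(a, b):
    """16x16 bit product built from four 8x8 bit partial products."""
    al, ah = a & MASK8, (a >> 8) & MASK8   # A = AH-AL
    bl, bh = b & MASK8, (b >> 8) & MASK8   # B = BH-BL

    p_ll = mul8(al, bl)   # first block:  A7..A0   x B7..B0
    p_hl = mul8(ah, bl)   # middle block: A15..A8  x B7..B0
    p_lh = mul8(al, bh)   # middle block: A7..A0   x B15..B8
    p_hh = mul8(ah, bh)   # last block:   A15..A8  x B15..B8

    # Align the partial products by their bit weights and add.
    return p_ll + ((p_hl + p_lh) << 8) + (p_hh << 16)


print(vedic_16x16(15, 11))
print(hex(vedic_16x16(0xFFFF, 0xFFFF)))
```

The middle two products share the same weight (shifted by 8 bits), which is why they are added together before being merged with the low and high products.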

The modified Vedic multiplier is proposed to reduce the total number of half adders and full adders and to rectify the fault that occurs when the carry input is 1. The four-level 2D-DWT is applied to obtain a higher image resolution than the conventional three-level 2D-DWT, and its output is shown in Fig. 7.

### 5. RESULTS AND DISCUSSIONS

The conventional Vedic multiplier gives a wrong result in some cases: for 15 x 11 it gives 105 instead of 165, because the carry output of 1 is not processed. To overcome this fault, the modified Vedic multiplier is proposed. Simulation is done using Modelsim6.3c. The simulation result of the conventional Vedic multiplier is shown in Fig. 8.

The modified Vedic multiplier shown in Fig. 6 gives the exact result when the carry input is 1. Instead of the 16-bit ripple carry adders (32 half adders), 8 half adders and one full adder are used in the modified Vedic multiplier, which reduces the area and delay compared with the conventional Vedic multiplier. The simulation result of the modified Vedic multiplier is shown in Fig. 9. The performance analysis of the conventional and modified MAC units for the 2D-DWT is shown in Fig. 10.



Figure. 7. Image decomposition output of four level two dimensional-discrete wavelet transform (2D-DWT)



Figure. 8. Simulation result of Conventional Vedic multiplier with fault during carry=1



Figure. 9. Simulation result of proposed Vedic multiplier without fault during carry=1






[Bar chart comparing LUT count, slice count, and delay (ns) for the modified and conventional Vedic multipliers. Data labels recoverable from the chart: 767, 723, 430, 404, 8.637.]

Figure. 10. Performance analysis of Conventional and modified multiplier-accumulate (MAC) Unit for two dimensional-discrete wavelet transform (2D-DWT)

### 6. CONCLUSION

An efficient multiplier called the modified Vedic multiplier has been proposed for the multilevel 2D-DWT. The proposed multiplier provides low area and less delay by using a smaller number of full adders and half adders instead of a ripple carry adder. In this paper, the hardware design and implementation of an FPGA based parallel architecture for the 2D-DWT with modified Vedic multipliers is presented. The design was implemented on a Xilinx Spartan 3 XC3S50 FPGA device. A comparative study of the multilevel 2D-DWT with the regular Vedic multiplier and with the modified Vedic multiplier was done. Compared with the regular Vedic multiplier, the modified Vedic multiplier shows a greater reduction in device utilization. The proposed method offers a 20% area reduction and a 10% power reduction compared with the existing architecture. Hence it is concluded that the modified Vedic multiplier for the multilevel 2D-DWT provides an efficient method for reducing the area and delay of the MAC unit of the FIR filter for the 2D-DWT.

### REFERENCES

- [1] A. Ronisha Prakash and S. Kirubaveni, "Performance evaluation of FFT Processor Using Conventional and Vedic Algorithm," IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN 2013), 2013 IEEE.
- [2] Chao-Tsung Huang, Po-Chih Tseng, and Liang-Gee Chen, "Analysis and VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform," IEEE Transactions on Signal Processing, Vol. 53, No. 4, April 2005.
- [3] Bing-Fei Wu, Chung-Fu Lin, "A High Performance and Memory Efficient Pipeline Architecture for the 5/3 and 9/7 Discrete Wavelet transform of JPEG2000 Codec," IEEE Transactions on Circuits And Systems For Video Technology, Vol. 15, No. 12, Dec. 2005.

- [4] Chuo-Ling Chang, and Bernd Girod, "Direction-Adaptive Discrete Wavelet Transform for Image Compression," IEEE Transactions On Image Processing, Vol. 16, No. 5, May 2007.
- [5] Jinook Song, and In-Cheol Park, "Pipelined Discrete Wavelet Transform Architecture Scanning Dual Lines," IEEE Transactions on Circuits and Systems—II: Express Briefs, Vol. 56, No. 12, Dec. 2009.
- [6] Pavan Kumar U. C. S and A. Radhika, "FPGA Implementation of high speed 8-bit Vedic multiplier using barrel shifter," 2013 IEEE.
- [7] Quanping Huang, Rongzheng Zhou, and Zhiliang Hong, "Low Memory and Low Complexity VLSI Implementation of JPEG2000 Codec," IEEE Transactions on Consumer Electronics, Vol. 50, No. 2, May 2004.
- [8] Yeong- Kang Lai, Lien- Fei Chen, and Yui- Chih Shih, "A High-Performance and Memory-Efficient VLSI Architecture with Parallel Scanning Method for 2-D Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Consumer Electronics, Vol. 55, No. 2, May 2009.
- [9] Chengyi Xiong, Jinwen Tian, and Jian Liu, "Efficient Architectures for Two-Dimensional Discrete Wavelet Transform Using Lifting Scheme," IEEE Transactions On Image Processing, Vol. 16, No. 3, March 2007.
- [10] N. Nagabhushanam and S. Ramachandran, "Fast Implementation of Lifting-based 1D/2D/3D DWT-IDWT Architecture for Image Compression," International Journal of Computer Applications, Vol.51, No.6, Aug. 2012.
- [11] Maria E. Angelopoulou and Peter Y. K. Cheung, "Implementation and Comparison of the 5/3 Lifting 2D Discrete Wavelet Transform Computation Schedules on FPGAs," Journal of VLSI Signal Processing 2007.
- [12] Poornima M, Shivaraj Kumar Patel, Shivukumar, Shridhar K P, Sanjay H, "Implementation of Multiplier using Vedic Algorithm," International Journal of Innovative technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Vol-2, Issue -6, May 2013.